Summary. We consider a semantic class, weakly-chase-sticky (WChS), and a
syntactic subclass, jointly-weakly-sticky (JWS), of Datalog± programs. Both extend
the class of weakly-sticky (WS) programs, which appear in our applications to
data quality. For WChS programs we propose a practical, polynomial-time query
answering algorithm (QAA). We establish that the two classes are closed under
magic-sets rewritings. As a consequence, QAA can be applied to the optimized
programs. QAA takes as inputs the program (including the query) and semantic
information about the "finiteness" of predicate positions. For the syntactic
subclasses JWS and WS of WChS, this additional information is computable.

Datalog±. Datalog, a rule-based language for query and view definition in
relational databases [5], is not expressive enough to logically represent interesting
and useful ontologies, at least of the kind needed to specify conceptual data
models. Datalog± extends Datalog by allowing existentially quantified variables
in rule heads (∃-variables), equality atoms in rule heads, and program constraints
[2]. Hence the "+" in Datalog±, while the "−" reflects syntactic restrictions on
programs, for better computational properties.

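A typical Datalog± program, $\mathcal{P}$, is a finite set of rules, $\Sigma \cup E \cup N$, together with an
extensional database (finite set of facts), $D$. The rules in $\Sigma$ are tuple-generating
dependencies (tgds) of the form $\exists \bar{x}\, P(\bar{x}, \bar{x}') \leftarrow P_1(\bar{x}_1), \ldots, P_n(\bar{x}_n)$, where $\bar{x}' \subseteq \bigcup_i \bar{x}_i$,
and $\bar{x}$ can be empty. $E$ is a set of equality-generating dependencies (egds) of
the form $x = x' \leftarrow P_1(\bar{x}_1), \ldots, P_n(\bar{x}_n)$, with $\{x, x'\} \subseteq \bigcup_i \bar{x}_i$. Finally, $N$ contains
negative constraints of the form $\bot \leftarrow P_1(\bar{x}_1), \ldots, P_n(\bar{x}_n)$, where $\bot$ is false.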

Example 1. The following Datalog± program shows a tgd, an egd, and a negative
constraint, in this order: $\exists x\, \mathit{Assist}(y, x) \leftarrow \mathit{Doctor}(y)$; $x = x' \leftarrow
\mathit{Assist}(y, x), \mathit{Assist}(y, x')$; $\bot \leftarrow \mathit{Specialist}(y, x, z), \mathit{Nurse}(y, z)$. □
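To make the three rule forms concrete, the following minimal sketch encodes the rules of Example 1 as plain Python data. The representation (atoms as tuples, variables prefixed with "?", and the class names `TGD`, `EGD`, `NegativeConstraint`) is a hypothetical choice for illustration, not part of the paper; it only shows how the ∃-variables of a tgd and the well-formedness condition $\{x, x'\} \subseteq \bigcup_i \bar{x}_i$ of an egd can be read off the syntax.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical encoding: an atom is (predicate, term1, term2, ...);
# variables are strings starting with "?", anything else is a constant.
Atom = Tuple[str, ...]


def variables(atoms):
    """Collect the variables occurring in a sequence of atoms."""
    return {t for a in atoms for t in a[1:] if t.startswith("?")}


@dataclass(frozen=True)
class TGD:
    head: Tuple[Atom, ...]
    body: Tuple[Atom, ...]

    def existential_vars(self):
        # ∃-variables: head variables that do not occur in the body
        return variables(self.head) - variables(self.body)


@dataclass(frozen=True)
class EGD:
    lhs: str
    rhs: str
    body: Tuple[Atom, ...]

    def is_well_formed(self):
        # both sides of the equality must occur in the body
        return {self.lhs, self.rhs} <= variables(self.body)


@dataclass(frozen=True)
class NegativeConstraint:
    body: Tuple[Atom, ...]  # the head is ⊥ (false)


# The three rules of Example 1 (x' is written as ?x2)
tgd = TGD(head=(("Assist", "?y", "?x"),), body=(("Doctor", "?y"),))
egd = EGD("?x", "?x2", body=(("Assist", "?y", "?x"), ("Assist", "?y", "?x2")))
nc = NegativeConstraint(body=(("Specialist", "?y", "?x", "?z"), ("Nurse", "?y", "?z")))

print(tgd.existential_vars(), egd.is_well_formed())  # {'?x'} True
```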

Below, when we refer to a class of Datalog± programs, we consider only $\Sigma$, the
tgds. Due to different syntactic restrictions, Datalog± can be seen as a class
of sublanguages of Datalog∃, which is the extension of Datalog with tgds with
∃-variables […].

The rules of a Datalog± program can be seen as an ontology $\mathcal{O}$ on top of
$D$, which can be incomplete. $\mathcal{O}$ plays the role of: (a) a "query layer" for $D$,
providing ontology-based data access (OBDA) [9], and (b) the specification of a
completion of $D$, usually carried out through the chase mechanism that, starting
from $D$, iteratively enforces the rules in $\Sigma$, generating new tuples. This leads to
a possibly infinite instance extending $D$, denoted with $\mathit{chase}(\Sigma, D)$.
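As an illustration of how the chase completes $D$, here is a minimal sketch of a restricted chase (a trigger fires only if its head is not already satisfied) under a hypothetical tuple encoding: variables start with "?", labelled nulls with "_:". Because $\mathit{chase}(\Sigma, D)$ may be infinite, the sketch stops after a caller-chosen number of rounds (`max_rounds` is an assumption of the sketch, not part of the chase itself). The example program, with hypothetical predicates `Person` and `HasParent`, has an infinite chase, so only a prefix is computed.

```python
from itertools import count

# Hypothetical encoding: atoms are tuples (predicate, terms...); variables start
# with "?", labelled nulls with "_:", everything else is a constant.
# A tgd is a pair (body_atoms, head_atoms).


def is_var(t):
    return t.startswith("?")


def match(atoms, instance, h=None):
    """Yield every homomorphism (dict variable -> term) mapping all atoms into instance."""
    h = dict(h or {})
    if not atoms:
        yield h
        return
    first, rest = atoms[0], atoms[1:]
    for fact in instance:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        h2, ok = dict(h), True
        for t, v in zip(first[1:], fact[1:]):
            if (is_var(t) and h2.setdefault(t, v) != v) or (not is_var(t) and t != v):
                ok = False
                break
        if ok:
            yield from match(rest, instance, h2)


def chase(tgds, database, max_rounds=10):
    """Restricted chase, cut off after max_rounds since the chase may be infinite."""
    instance, fresh = set(database), count(1)
    for _ in range(max_rounds):
        new_atoms = set()
        for body, head in tgds:
            for h in list(match(body, instance)):
                # only fire the trigger if the head is not already satisfied
                if next(match(head, instance | new_atoms, h), None) is not None:
                    continue
                ext = dict(h)
                for a in head:
                    for t in a[1:]:
                        if is_var(t) and t not in ext:
                            ext[t] = f"_:n{next(fresh)}"  # fresh null for an ∃-variable
                new_atoms |= {(a[0],) + tuple(ext.get(t, t) for t in a[1:]) for a in head}
        if not new_atoms:
            break  # fixpoint: the chase terminated
        instance |= new_atoms
    return instance


sigma = [
    ((("Person", "?x"),), (("HasParent", "?x", "?y"),)),
    ((("HasParent", "?x", "?y"),), (("Person", "?y"),)),
]
result = chase(sigma, {("Person", "alice")}, max_rounds=4)
print(len(result))  # 5: new ancestors keep appearing, so only a prefix is computed
```

With a single non-recursive tgd such as $\exists x\, \mathit{Assist}(y, x) \leftarrow \mathit{Doctor}(y)$ the loop reaches a fixpoint instead, and no round bound is needed.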

The answers to a conjunctive query $Q(\bar{x})$ from $D$ w.r.t. $\Sigma$ are the tuples of
constants $\bar{a}$ such that $\Sigma \cup D \models Q(\bar{a})$ (or yes or no, in case $Q$ is boolean). The answers
can be obtained by querying, as usual, the universal instance $\mathit{chase}(\Sigma, D)$. The
chase may be infinite, which in some cases leads to undecidability of query
answering [8]. However, in some cases where the chase is infinite, query answering
(QA) is still computable (decidable), and even tractable in the size of $D$. Syntactic
classes of Datalog± programs with tractable QA have been identified and
investigated, among them sticky […] and weakly-sticky [4] Datalog± programs.


Our Need for QA Optimization. In our work, we concentrate on the stickiness
and weak-stickiness properties, because these programs appear in our
applications to quality data specification and extraction […], with the latter task
accomplished through QA, which thus becomes crucial.

Sticky programs [4] satisfy a syntactic restriction on the multiple occurrences
of variables (joins) in the body of a tgd. Weakly-sticky (WS) programs form a
class that extends that of sticky programs [4]. WS-Datalog± is more expressive
than sticky Datalog±, and results from applying the notion of weak-acyclicity
(WA), as found in data exchange [6], to relax the stickiness conditions.
More precisely, in comparison with sticky programs, WS programs require a
milder condition on join variables, which is based on a program's dependency
graph and the positions in it with finite rank [6].¹

For QA, sticky programs enjoy first-order rewritability [7], i.e. a conjunctive
query $Q$ posed to $\Sigma \cup D$ can be rewritten into a new first-order (FO) query
$Q'$, and correctly answered by posing $Q'$ to $D$ and answering as usual. For WS
programs, QA is PTIME-complete in data complexity, but the polynomial-time algorithm
provided for the proof in [4] is not a practical one.
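A tiny hand-worked instance of FO rewritability may help. Take $\Sigma = \{\exists x\, \mathit{Assist}(y, x) \leftarrow \mathit{Doctor}(y)\}$ from Example 1 (trivially sticky, since its body has no repeated variables) and $Q(y) \leftarrow \mathit{Assist}(y, x)$. The rewriting $Q'(y) = \exists x\, \mathit{Assist}(y, x) \vee \mathit{Doctor}(y)$ was derived by hand for this sketch, not produced by a rewriting algorithm; the data and the helper names (`chase_once`, `q_on_chase`, `q_rewritten`) are hypothetical. The code checks that posing $Q'$ to $D$ alone gives the same answers as posing $Q$ to the chase.

```python
from itertools import count

# Σ = { ∃x Assist(y, x) <- Doctor(y) },  Q(y) <- Assist(y, x)
D = {("Doctor", "ann"), ("Assist", "bob", "carl")}


def chase_once(db):
    """Chase for this single non-recursive tgd: one pass suffices (hypothetical helper)."""
    fresh, out = count(1), set(db)
    for f in db:
        if f[0] == "Doctor" and not any(g[0] == "Assist" and g[1] == f[1] for g in db):
            out.add(("Assist", f[1], f"_:n{next(fresh)}"))  # fresh null for x
    return out


def q_on_chase(instance):
    """Q(y) <- Assist(y, x), evaluated on the chase (y is never a null here)."""
    return {f[1] for f in instance if f[0] == "Assist"}


def q_rewritten(db):
    """Q'(y) = (∃x Assist(y, x)) ∨ Doctor(y), posed directly to D without chasing."""
    return {f[1] for f in db if f[0] in ("Assist", "Doctor")}


print(sorted(q_on_chase(chase_once(D))), sorted(q_rewritten(D)))  # ['ann', 'bob'] ['ann', 'bob']
```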


Stickiness of the Chase. In addition to (syntactic) stickiness, there is a
"semantic" property of programs, which is relative to the chase (and the data,
$D$), and is called "chase-stickiness" (ChS). Stickiness implies chase-stickiness
(but not necessarily the other way around) [4]. For chase-sticky programs, QA
is tractable [4].

Intuitively, a program has the chase-stickiness property if the following holds
for the application of every tgd $\sigma$: when a value replaces a repeated variable in
the body of $\sigma$, then that value also appears in all the head atoms obtained through
the iterative enforcement of applicable rules that starts with $\sigma$. So, that value is
propagated all the way down through all the possible subsequent steps.


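[Figure 1 image. Left: chase of $\Sigma_1$ over $D$, with $\mathit{Assist}(a, b)$, $\mathit{Assist}(b, c)$, $\mathit{Nurse}(b, c)$, $\mathit{Specialist}(b, c, z)$, $\mathit{Doctor}(c)$. Right: chase of $\Sigma_2$ over $D$, the same atoms without $\mathit{Doctor}(c)$.]

Fig. 1. The chase for a non-ChS program and the chase for a ChS program, resp.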


¹ A position refers to a predicate attribute, e.g. $\mathit{Nurse}[2]$.

Example 2. Consider $D = \{\mathit{Assist}(a, b), \mathit{Assist}(b, c)\}$ and the following set,
$\Sigma_1$, of tgds: $\mathit{Nurse}(y, z) \leftarrow \mathit{Assist}(x, y), \mathit{Assist}(y, z)$; $\exists z\, \mathit{Specialist}(x, y, z) \leftarrow
\mathit{Nurse}(x, y)$; $\mathit{Doctor}(y) \leftarrow \mathit{Specialist}(x, y, z)$. $\Sigma_1$ is not ChS, as the chase
on the LHS of Figure 1 shows: value $b$ is not propagated all the way down to
$\mathit{Doctor}(c)$. However, program $\Sigma_2$, which is $\Sigma_1$ without its third rule, is ChS, as
shown on the RHS of Figure 1. □
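Example 2 can be replayed mechanically. The sketch below runs an oblivious chase (each trigger fires once) that records, for every derived atom, the body image it came from, and then checks whether value $b$, which replaced the repeated body variable $y$ of the first tgd to produce $\mathit{Nurse}(b, c)$, appears in that atom and in every atom derived from it. This only tests the propagation condition for one trigger on one instance, as an illustration of the definition; it is not a decision procedure for ChS. The encoding and all function names are hypothetical.

```python
from itertools import count

# Hypothetical encoding: atoms are tuples (predicate, terms...); variables start
# with "?", nulls with "_:"; a tgd is a pair (body_atoms, head_atoms).


def is_var(t):
    return t.startswith("?")


def match(atoms, instance, h=None):
    """Yield every homomorphism (dict variable -> term) mapping all atoms into instance."""
    h = dict(h or {})
    if not atoms:
        yield h
        return
    first, rest = atoms[0], atoms[1:]
    for fact in instance:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        h2, ok = dict(h), True
        for t, v in zip(first[1:], fact[1:]):
            if (is_var(t) and h2.setdefault(t, v) != v) or (not is_var(t) and t != v):
                ok = False
                break
        if ok:
            yield from match(rest, instance, h2)


def chase_with_provenance(tgds, database, max_rounds=10):
    """Oblivious chase (each trigger fires once) recording each new atom's body image."""
    instance, parents, fired, fresh = set(database), {}, set(), count(1)
    for _ in range(max_rounds):
        grew = False
        for i, (body, head) in enumerate(tgds):
            for h in list(match(body, instance)):
                key = (i, tuple(sorted(h.items())))
                if key in fired:
                    continue
                fired.add(key)
                image = [(a[0],) + tuple(h.get(t, t) for t in a[1:]) for a in body]
                ext = dict(h)
                for a in head:
                    for t in a[1:]:
                        if is_var(t) and t not in ext:
                            ext[t] = f"_:z{next(fresh)}"  # fresh null for an ∃-variable
                    new = (a[0],) + tuple(ext.get(t, t) for t in a[1:])
                    if new not in instance:
                        instance.add(new)
                        parents[new] = image
                        grew = True
        if not grew:
            break
    return instance, parents


def descendants(atom, parents):
    """Atoms derived, directly or transitively, from atom."""
    found, frontier = set(), {atom}
    while frontier:
        frontier = {a for a, ps in parents.items() if a not in found and frontier.intersection(ps)}
        found |= frontier
    return found


def value_propagates(value, atom, parents):
    """Does value occur in atom and in every atom derived from it?"""
    return all(value in a[1:] for a in {atom} | descendants(atom, parents))


D = {("Assist", "a", "b"), ("Assist", "b", "c")}
sigma1 = [
    ((("Assist", "?x", "?y"), ("Assist", "?y", "?z")), (("Nurse", "?y", "?z"),)),
    ((("Nurse", "?x", "?y"),), (("Specialist", "?x", "?y", "?z"),)),
    ((("Specialist", "?x", "?y", "?z"),), (("Doctor", "?y"),)),
]
sigma2 = sigma1[:2]  # Σ2 is Σ1 without its third rule

for name, sigma in (("Sigma1", sigma1), ("Sigma2", sigma2)):
    _, parents = chase_with_provenance(sigma, D)
    # b replaced the repeated body variable ?y of the first tgd, producing Nurse(b, c)
    print(name, value_propagates("b", ("Nurse", "b", "c"), parents))
# prints "Sigma1 False" and then "Sigma2 True"
```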

Weak-Stickiness of the Chase. Weak-stickiness also has a semantic version,
called "weak-chase-stickiness" (WChS), which is implied by the former. As with
chase-stickiness, weak-chase-sticky programs have a tractable QA problem,
even with a possibly infinite chase. This class is one of the two we introduce and
investigate. Both appear in double-edged boxes in Figure 2, with dashed edges
indicating a semantic class.

By definition, weak-chase-stickiness is obtained by relaxing the condition
for ChS: it applies only to values for repeated variables in the body of a tgd that
appear in so-called infinite positions, which are semantically defined. A position
is infinite if there is an instance $D$ for which an unlimited number of different
values appear in that position in $\mathit{chase}(\Sigma, D)$.

Given a program, deciding if a position is infinite is undecidable, as is deciding
in general whether the chase terminates. Consequently, it is also undecidable whether a
program is WChS. However, there are syntactic conditions on programs […]
that determine some (but not necessarily all) of the finite positions. For example,
the notion of position rank, based on the program's dependency graph, is
used in […] to identify a (sound) set of finite positions, those with finite rank.
Furthermore, finite-rank positions are used in [4] to define weakly-sticky (WS)
programs as a syntactic subclass of WChS.
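The finite-rank positions can be computed from the dependency graph of weak acyclicity [6]: for every tgd and every body variable that also occurs in the head, there is a normal edge from each of its body positions to each of its head positions, and a special edge from each of its body positions to each head position holding an ∃-variable. A position has infinite rank exactly when it is reachable from a cycle that goes through a special edge. The sketch below follows this construction under a hypothetical encoding; the predicates `Person`, `HasParent`, `Employee`, and `Badge` are made up for the example.

```python
from collections import defaultdict

# Hypothetical encoding: a tgd is (body_atoms, head_atoms); atoms are tuples
# (predicate, terms...), variables start with "?"; a position is (predicate, index).


def is_var(t):
    return t.startswith("?")


def dependency_graph(tgds):
    """Edges (src, dst, special) of the dependency graph used for weak acyclicity."""
    edges = set()
    for body, head in tgds:
        body_vars = {t for a in body for t in a[1:] if is_var(t)}
        head_vars = {t for a in head for t in a[1:] if is_var(t)}
        exist_pos = {(a[0], j) for a in head for j, t in enumerate(a[1:], 1)
                     if is_var(t) and t not in body_vars}
        for a in body:
            for i, x in enumerate(a[1:], 1):
                if not is_var(x) or x not in head_vars:
                    continue  # only body variables that also occur in the head create edges
                for b in head:
                    for j, t in enumerate(b[1:], 1):
                        if t == x:
                            edges.add(((a[0], i), (b[0], j), False))  # normal edge
                for p in exist_pos:
                    edges.add(((a[0], i), p, True))  # special edge to an ∃-position
    return edges


def finite_rank_positions(tgds):
    """Positions not reachable from a cycle through a special edge (finite rank)."""
    edges = dependency_graph(tgds)
    succ = defaultdict(set)
    for u, v, _ in edges:
        succ[u].add(v)

    def reach(start):
        # positions reachable from start by a path of length >= 0
        seen, stack = {start}, [start]
        while stack:
            for m in succ[stack.pop()]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    infinite = set()
    for u, v, special in edges:
        if special and u in reach(v):  # the special edge lies on a cycle
            infinite |= reach(v)
    every = {(a[0], i) for body, head in tgds for a in body + head for i in range(1, len(a))}
    return every - infinite


sigma = [
    ((("Person", "?x"),), (("HasParent", "?x", "?y"),)),
    ((("HasParent", "?x", "?y"),), (("Person", "?y"),)),
    ((("Employee", "?x"),), (("Badge", "?x", "?z"),)),
]
print(sorted(finite_rank_positions(sigma)))  # [('Badge', 1), ('Badge', 2), ('Employee', 1)]
```

For the non-recursive $\Sigma_1$ of Example 2, every position comes out with finite rank.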

Finite Positions and Program Classes. In principle, any set-valued function
$S$ that, given a program, returns a subset of the program's finite positions can
be used to define a subclass $\mathrm{WChS}(S)$ of WChS. This is done by applying the
definition of WChS above with "infinite positions" replaced by "non-$S$-finite
positions". Every class $\mathrm{WChS}(S)$ has a tractable QA problem.

$S$ may or may not be computable on the basis of the program syntax. In the
former case, $\mathrm{WChS}(S)$ is a "syntactic class". Class $\mathrm{WChS}(S)$ grows monotonically
with $S$, in the sense that if $S_1 \subseteq S_2$ (i.e. $S_1$ always returns a subset of the
positions returned by $S_2$), then $\mathrm{WChS}(S_1) \subseteq \mathrm{WChS}(S_2)$. In general, the more
finite positions are (correctly) identified (and, consequently, the fewer finite
positions are treated as infinite), the more general the subclass of WChS that is
identified or characterized.

For example, for the function $S^{\emptyset}$ that always returns an empty set of finite
positions, $\mathrm{WChS}(S^{\emptyset})$ is the class of sticky programs, because stickiness must hold
no matter what the (in)finite positions are. At the other extreme, for the function
$S^{\mathit{fin}}$ that returns all the (semantically) finite positions, $\mathrm{WChS}(S^{\mathit{fin}})$ becomes the
class WChS. (As mentioned above, $S^{\mathit{fin}}$ is in general uncomputable.) Now, if
$S^{\mathit{rank}}$ returns the set of finite-rank positions (for a program $\mathcal{P}$, usually denoted
by $\Pi_F(\mathcal{P})$ [6]), $\mathrm{WChS}(S^{\mathit{rank}})$ is the class of WS programs.

Joint-Weakly-Stickiness. The joint-weakly-sticky (JWS) programs we introduce
form a syntactic class strictly between WS and WChS. Its definition appeals
to the notions of joint-acyclicity and existential dependency graphs introduced
in [12]. Figure 2 shows this syntactic class, and the inclusion relationships
between classes of Datalog± programs.²

If $S^{\exists}$ denotes the function that specifies finite positions on the basis of the
existential dependency graphs (EDGs), implicitly defined in [12], the JWS class is,
by definition, the class $\mathrm{WChS}(S^{\exists})$. EDGs provide a finer mechanism for capturing
(in)finite positions than position ranks (defined through dependency graphs):
$S^{\mathit{rank}} \subsetneq S^{\exists}$. Consequently, the class of JWS programs, i.e. $\mathrm{WChS}(S^{\exists})$, is a
strict superclass of the class of WS programs, i.e. $\mathrm{WChS}(S^{\mathit{rank}})$.³

QAA for WChS. Our query answering algorithm for WChS programs is
parameterized by a (sound) finite-position function $S$ as above. It is denoted with
$\mathit{AL}^{S}$ and takes as input the program $\Sigma \cup D$, the query $Q$, and $S(\Sigma)$, which is a
subset of the program's finite positions (the others are treated as infinite by default).

The customized algorithm $\mathit{AL}^{S}$ is guaranteed to be sound and complete only
when applied to programs in $\mathrm{WChS}(S)$: $\mathit{AL}^{S}(\Sigma \cup D, Q)$ returns all and only the
query answers. (Actually, $\mathit{AL}^{S}$ is still sound for any program in WChS.) $\mathit{AL}^{S}$
runs in polynomial time in data, and can be applied to both the WS and the
JWS syntactic classes, for which the finite-position functions are computable.

$\mathit{AL}^{S}$ is based on the concepts of parsimonious chase (pChase) and freezing
nulls, as used for QA with shy Datalog, a fragment of Datalog± […]. At a pChase
step, a new atom is added only if a homomorphic atom is not already in the
chase. Freezing a null means promoting it to a constant (and keeping it as such in
subsequent chase steps), so that it cannot take (other) values under homomorphisms,
which may create new pChase steps. Resumption of the pChase means freezing
all nulls and continuing the pChase until no more pChase steps are applicable.

Query answering with shy programs has a first phase where the pChase runs
until termination (which it does). In a second phase, the pChase iteratively
resumes a number of times that depends on the number of distinct ∃-variables in
the query. This second phase is required to properly deal with joins in the query.
Our QAA for WChS programs (AL) is similar: it has the same two phases, but a
pChase step is modified: after every application of a pChase step that generates
nulls, those nulls that appear in $S$-finite positions are immediately frozen.

Magic-Sets Rewriting. It turns out that JWS, as opposed to WS, is closed
under the quite general magic-sets rewriting method [5] introduced in [1]. As a
consequence, AL can be applied to both the original JWS program and its magic-sets
rewriting. (Actually, this also holds for the superclass WChS.)

² Rectangles with dotted edges show semantic classes, and double-edged rectangles
show the classes introduced in this work. Notice that programs in semantic classes
include the instance $D$, whereas syntactic classes are data-independent (membership
holds for any instance, as long as the syntactic conditions apply).

³ The JWS class is different from (and incomparable with) the class of weakly-sticky-join
(WSJ) programs introduced in [3], which extends the class of WS programs with
considerations that differ from those used for JWS programs. WSJ generalizes
WS on the basis of the weakly-sticky-join property of the chase, which is related
to repeated variables in single atoms.
Fig. 2. Generalization relationships among program classes.

It can be proved that (our modification of) the magic-sets rewriting method
in [1] does not change the character of the original finite or infinite positions.
The specification of the (in)finiteness character of positions in magic predicates is
not required by $\mathit{AL}^{S}$, because no new nulls appear in them during the execution of AL.
As a consequence, the magic-sets rewriting method can be fully integrated with our
QAA, providing additional efficiency.